Application No.: 10/514,413

AMENDMENTS TO THE CLAIMS 

Please enter the following amendments: 

1. (Currently Amended) An apparatus for determining, based on speech waveform data,
a portion representing a feature of the speech waveform, comprising: 

an acoustic/prosodic analysis unit which calculates [[extracting means for calculating]], from
said data, a distribution of energy of a prescribed frequency range of said speech waveform along
a time axis, and [[extracting]] extracts, among various syllables, a first portion of said speech
waveform that is generated stably by a source of said speech waveform, based on the distribution
of energy and pitch of said speech waveform;

a cepstral analysis unit which calculates [[estimating means for calculating]], from said data,
a frequency spectrum distribution of said speech waveform along the time axis, and [[estimating]]
estimates, based on the frequency spectrum distribution, a second portion of said speech
waveform, for which change is well controlled by said source; and

a pseudo-syllabic center extracting unit which determines [[means for determining]] the
portion representing the feature of said speech waveform based on the first portion extracted by
[[said extracting means]] the sonorant energy calculating unit and the second portion estimated by
[[said estimating means]] the cepstral analysis unit, wherein

said cepstral analysis unit includes: 

a linear prediction analysis unit which performs linear prediction analysis on said 
speech waveform and outputting an estimated value of formant frequency; 

a cepstral distance calculating unit which calculates, using said data, a distribution
of cepstral distance on the time axis based on the estimated value of formant frequency provided 
by said linear prediction analysis unit; 

an inter-frame variance calculating unit which calculates, based on an output from 
said linear prediction analysis unit, distribution of local variance of magnitude of delta cepstrum 
of said speech waveform on the time axis; and 

a reliability center candidate output unit which estimates, based both on said 
distribution on the time axis based on the estimated value of formant frequency calculated by 
said cepstral distance calculating unit and on said distribution on the time axis of local variance 
of magnitude of delta cepstrum of said speech waveform calculated by said inter-frame variance 
calculating unit, a range in which change in the speech waveform is well controlled by said 
source.

2. (Currently Amended) The apparatus according to claim 1, wherein
said [[extracting means]] acoustic/prosodic analysis unit includes:

a pitch determining unit which determines [[voiced/unvoiced determining means
for determining]], based on said data, whether each segment of said speech waveform is a voiced
segment or not,

a dip detecting unit which separates [[means for separating]] said speech waveform
into syllables at a local minimum of said waveform of energy distribution of the prescribed
frequency range of said speech waveform on the time axis; and

a voiced/energy determining unit which extracts [[means for extracting]] that range
of said speech waveform which includes, in each syllable, an energy peak in that syllable within
the segment determined to be a voiced segment by said [[voiced/unvoiced determining means]]
pitch determining unit and in which the energy of the prescribed frequency range is not lower
than a prescribed threshold value.
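
The three units recited in claim 2 can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the per-frame energy values, the per-frame voiced flags, the function names `split_at_dips` and `stable_ranges`, and the rule of growing the range outward from the energy peak are all hypothetical choices that the claim does not specify.

```python
def split_at_dips(energy):
    """Cut the frame sequence into syllable ranges at local energy minima.

    Returns half-open (start, end) frame ranges. The dip frame itself is
    assigned to the following syllable (an arbitrary choice for this sketch).
    """
    cuts = [i for i in range(1, len(energy) - 1)
            if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]]
    bounds = [0] + cuts + [len(energy)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def stable_ranges(energy, voiced, threshold):
    """For each syllable, keep the range around its energy peak in which the
    frames are voiced and the energy is not lower than the threshold."""
    def ok(i):
        return voiced[i] and energy[i] >= threshold

    out = []
    for start, end in split_at_dips(energy):
        peak = max(range(start, end), key=lambda i: energy[i])
        if not ok(peak):
            continue  # peak is unvoiced or too weak: no range for this syllable
        lo = peak
        while lo > start and ok(lo - 1):
            lo -= 1
        hi = peak + 1
        while hi < end and ok(hi):
            hi += 1
        out.append((lo, hi))
    return out


# Hypothetical input: two syllables separated by an energy dip at frame 4.
energy = [1, 5, 9, 4, 1, 6, 8, 3, 1]
voiced = [False, True, True, True, False, True, True, False, False]
ranges = stable_ranges(energy, voiced, threshold=4)
```

With this input the sketch yields one range per syllable, each containing that syllable's energy peak.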

3. (Canceled) 

4. (Currently Amended) The apparatus according to claim 1, wherein 
said pseudo-syllabic center extracting unit [[determining means]] includes:

means for determining a range included in the range extracted by said [[extracting
means]] acoustic/prosodic analysis unit within the range of which change in speech waveform is
estimated by said [[estimating means]] cepstral analysis unit to be well controlled by said source.
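
One way to read claim 4 is as an intersection of two sets of frame ranges. The sketch below assumes that reading; the half-open `(start, end)` representation, the requirement that each list be sorted and non-overlapping, and the name `intersect_ranges` are hypothetical and do not come from the claim.

```python
def intersect_ranges(a, b):
    """Intersect two sorted, non-overlapping lists of half-open ranges."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever range finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical ranges: "stably generated" ranges and "well controlled" ranges.
stable = [(1, 4), (5, 7)]
controlled = [(2, 6)]
common = intersect_ranges(stable, controlled)
```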

5-6. (Canceled) 

7. (Currently Amended) An apparatus for determining a portion representing a feature
of a speech signal, comprising: 

a linear prediction analysis unit which performs [[predicting means for performing]] linear
prediction analysis on said speech signal;

a cepstral distance calculating unit which calculates [[first calculating means for
calculating]], based on an estimated value of formant provided by said linear prediction analysis
unit [[predicting means]] and said speech signal, a distribution of cepstral distance, along time axis,
based on the estimated value of formant;

an inter-frame variance calculating unit which calculates [[second calculating means for
calculating]], based on a result of the linear prediction analysis by said linear prediction analysis
unit [[predicting means]], a distribution, along time axis, of a variance of [[local spectral change]]
magnitude of delta cepstrum in said speech signal along the time axis; and

[[means for estimating]] a reliability center candidate output unit which estimates, based on
the distribution based on the estimated value of formant calculated by said [[first calculating
means]] cepstral distance calculating unit and the distribution of variance of [[local spectral change]]
magnitude of delta cepstrum in said speech waveform calculated by said [[second calculating
means]] inter-frame variance calculating unit, a portion of said speech waveform in which a
change in said speech waveform is well controlled by said source.
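
The quantity recited for the inter-frame variance calculating unit, a variance over time of the magnitude of the delta cepstrum, can be sketched as follows. This is a hedged illustration only: taking the delta cepstrum as a plain frame-to-frame difference, using the Euclidean norm as its magnitude, and the sliding-window width `half_width` are assumptions, and the function names are hypothetical.

```python
import math


def delta_magnitudes(cepstra):
    """Euclidean norm of the frame-to-frame cepstral difference.

    The first frame has no predecessor and is given 0.0 (an assumption).
    """
    mags = [0.0]
    for prev, cur in zip(cepstra, cepstra[1:]):
        mags.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return mags


def local_variance(values, half_width=1):
    """Population variance of values in a window centred on each frame.

    Windows are truncated at the ends of the sequence.
    """
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        mean = sum(window) / len(window)
        out.append(sum((v - mean) ** 2 for v in window) / len(window))
    return out


# Hypothetical two-coefficient cepstra: one abrupt change, then a steady state.
cepstra = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.0]]
mags = delta_magnitudes(cepstra)
variance = local_variance(mags)
```

In this toy input the variance falls to zero once the cepstrum stops changing, which is the kind of region the claim describes as well controlled by the source.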

8. (Currently Amended) A [[program product causing, when executed on a computer, said
computer]] machine readable medium having data stored thereon, the data, once read by the
machine, causing the machine to operate as an apparatus for determining, based on speech
waveform data, a portion representing a feature of the speech waveform, said apparatus
comprising:

an acoustic/prosodic analysis unit which calculates [[extracting means for calculating]], from
said data, distribution of energy of a prescribed frequency range of said speech waveform along a 
time axis, and extracting, among various syllables, a first portion of said speech waveform that is 
generated stably by a source of said speech waveform, based on the distribution of energy and 
pitch of said speech waveform; 

a cepstral analysis unit which calculates [[estimating means for calculating]], from said data,
a frequency spectrum distribution of said speech waveform along the time axis, and estimating, 
based on the frequency spectrum distribution, a second portion of said speech waveform, for 
which change is well controlled by said source; and 

a pseudo-syllabic center extracting unit which determines [[means for determining]] the
portion representing a feature of said speech waveform based on the first portion extracted by
[[said extracting means]] the sonorant energy calculating unit and the second portion, wherein
said cepstral analysis unit includes: 

a linear prediction analysis unit which performs linear prediction analysis on said 
speech waveform and outputting an estimated value of formant frequency; 

a cepstral distance calculating unit which calculates, using said data, a distribution 
of cepstral distance on the time axis based on the estimated value of formant frequency provided 
by said linear prediction analysis unit; 

an inter-frame variance calculating unit which calculates, based on an output from 
said linear prediction analysis unit, distribution of local variance of magnitude of delta cepstrum 
of said speech waveform on the time axis; and 

a reliability center candidate output unit which estimates, based both on said
distribution on the time axis based on the estimated value of formant frequency calculated by 
said cepstral distance calculating unit and on said distribution on the time axis of local variance 
of magnitude of delta cepstrum of said speech waveform calculated by said inter-frame variance 
calculating unit, a range in which change in the speech waveform is well controlled by the 
source.

9. (Currently Amended) The [[program product]] machine readable medium according to
claim 8, wherein

said [[extracting means]] acoustic/prosodic analysis unit includes:

a pitch determining unit which determines [[voiced/unvoiced determining means
for determining]], based on said data, whether each segment of said speech waveform is a voiced
segment or not,

a dip detecting unit which separates [[means for separating]] said speech waveform
into syllables at a local minimum of said waveform of energy distribution of the prescribed 
frequency range of said speech waveform on the time axis; and 

a voiced/energy determining unit which extracts [[means for extracting]] that range
of said speech waveform which includes, in each syllable, an energy peak in that syllable within
the segment determined to be a voiced segment by said [[voiced/unvoiced determining means]]
pitch determining unit and in which the energy of the prescribed frequency range is not lower
than a prescribed threshold value. 

10. (Canceled) 

11. (Currently Amended) The [[program product]] machine readable medium according to
claim 8, wherein 

said pseudo-syllabic center extracting unit [[determining means]] includes:

means for determining a range included in the range extracted by said [[extracting
means]] acoustic/prosodic analysis unit, within the range of which change in speech waveform is
estimated by said [[estimating means]] cepstral analysis unit to be well controlled by said source.



12. (Canceled) 



13. (Currently Amended) A [[program product causing, when executed on a computer,
said computer]] machine readable medium having data stored thereon, the data, once read by the
machine, causing the machine to operate as an apparatus for determining a portion representing a 
feature of a speech signal, said apparatus comprising: 

a linear prediction analysis unit which performs [[predicting means for performing]] linear
prediction analysis on said speech signal; 

a cepstral distance calculating unit which calculates [[first calculating means for
calculating]], based on an estimated value of formant provided by said linear prediction analysis
unit [[predicting means]] and said speech signal, a distribution of cepstral distance along time axis
on the estimated value; 

an inter-frame variance calculating unit which calculates [[second calculating means for
calculating]], based on a result of the linear prediction analysis by said linear prediction analysis
unit [[predicting means]], a distribution [[along time axis]] of a variance of [[local spectral change]]
magnitude of delta cepstrum in said speech signal along time axis; and

[[means for estimating]] a reliability center candidate output unit which estimates, based on
the distribution based on the estimated value of formant calculated by said [[first calculating
means]] cepstral distance calculating unit and the distribution of variance of [[local spectral change]]
magnitude of delta cepstrum in said speech waveform calculated by said [[second calculating
means]] inter-frame variance calculating unit, a portion of said speech waveform in which a
change in said speech waveform is well controlled by said source.

14. (Currently Amended) A method of [[determining, based on]] extracting from a speech
waveform data[[,]] a portion representing a feature of the speech waveform, comprising the steps 
of: 

calculating, from said data, a distribution of energy of a prescribed frequency range of 
said speech waveform along a time axis, and extracting, among various syllables, a first portion 
of said speech waveform, that is generated stably by a source of said speech waveform, based on 
the distribution of energy and pitch of said speech waveform; 

calculating, from said data, a frequency spectrum distribution of said speech waveform 
along the time axis, and estimating, based on the frequency spectrum distribution, a second 
portion of said speech waveform, for which change is well controlled by said source; and 

extracting [[determining]] the portion representing a feature of said speech waveform based
on the first portion extracted in said extracting step and the second portion, wherein 

said estimating step includes: 

performing linear prediction analysis on said speech waveform and outputting an
estimated value of formant frequency; 

calculating, using said data, a distribution of cepstral distance on the time axis 
based on the estimated value of formant frequency provided in said step of outputting the 
estimated value; 

calculating, based on the calculated distribution based on the estimated value of 
formant frequency, distribution of local variance of magnitude of delta cepstrum of said speech 
waveform on the time axis; and 

estimating, based both on said calculated distribution on the time axis related to 
the estimated value of formant frequency and on said calculated distribution on the time axis of 
local variance of magnitude of delta cepstrum of said speech waveform, a range in which change 
in the speech waveform is well controlled by said source.

15. (Currently Amended) The method according to claim 14, wherein 
said step of extracting [[step]] a first portion of said speech waveform includes the steps of:
determining, based on said data, whether each segment of said speech waveform 
is a voiced segment or not, 

detecting a local minimum of said waveform of energy distribution of the 
prescribed frequency range of said speech waveform on the time axis, and separating said speech 
waveform into syllables at the local minimum; and 

extracting that range of said speech waveform which includes, in each syllable, an
energy peak in that syllable within [[the]] a segment determined to be a voiced segment by said
voiced/unvoiced determining means and in which the energy of the prescribed frequency range is
not lower than a prescribed threshold value.



16. (Canceled) 

17. (Currently Amended) The method according to claim 14, wherein 

said [[determining]] step of extracting the portion representing a feature of said speech
waveform includes the step of: 

determining, as a portion of said speech waveform, a range included in the range 
extracted in said extracting step, within the range of which change in speech waveform is 
estimated in said estimating step to be well controlled by said source. 

18-22. (Canceled) 

23. (Currently Amended) An apparatus as recited in claim 1, wherein 
said [[estimating means]] cepstral analysis unit is configured to calculate [[includes means for
calculating]], from said data, a frequency spectrum distribution of said speech waveform along the
time axis, and [[estimating]] estimate the second portion, based on the frequency spectrum
distribution, as a portion where local variance of changes of the frequency spectrum is at a local 
minimum. 

24. (New) An apparatus as recited in claim 1, wherein
said cepstral distance calculating unit includes: 

a cepstrum re-generating unit connected to receive said estimated value of 
formant frequency from said linear prediction analysis unit, for recalculating cepstrum 
coefficients based on said value of formant frequency; and 

a logarithmic transformation and inverse discrete cosine transformation unit 
connected to receive said speech waveform data for calculating FFT cepstrum coefficients based 
on said waveform data, wherein 

the cepstral distance calculating unit is configured to calculate cepstrum distance 
between the cepstrum coefficients recalculated by said cepstrum re-generating unit and the FFT 
cepstrum coefficients calculated by said a logarithmic transformation and inverse discrete cosine 
transformation unit, said cepstrum distance indicating a distribution of unreliability; and 
said cepstral analysis unit includes: 

a standardizing and integrating unit which combines the cepstrum distance and 
the distribution on the time axis of local variance of spectral change and outputting a combined 
data, wherein 

the reliability center candidate output unit estimates the range in which change in 
the speech waveform is well controlled by said source at a dip of the combined data output by 
said standardizing and integrating unit. 
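
The chain recited in claim 24 (cepstral distance, standardizing and integrating, then picking a dip) can be sketched as below. This is a sketch under assumptions rather than the claimed implementation: the Euclidean distance, z-score standardization, combination by simple addition, and all function names are hypothetical, and the two cepstrum coefficient vectors are taken as given instead of being computed from formants or an FFT.

```python
import math


def cepstral_distance(regenerated, fft_based):
    """Euclidean distance between two cepstrum coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(regenerated, fft_based)))


def standardize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def combine(distance, variance):
    """Add the two standardized per-frame series (assumed combination rule)."""
    return [d + v for d, v in zip(standardize(distance), standardize(variance))]


def dips(values):
    """Indices of local minima, excluding the first and last frame."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]


# Hypothetical per-frame series: both measures are lowest at frame 2.
distance = [4.0, 2.0, 0.0, 2.0, 4.0]
variance = [3.0, 1.0, 0.0, 1.0, 3.0]
candidates = dips(combine(distance, variance))
```

Standardizing before adding keeps either measure from dominating only because of its scale; other weightings would fit the claim language equally well.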

25. (New) The machine readable medium according to claim 8, wherein
said cepstral distance calculating unit includes: 

a cepstrum re-generating unit connected to receive said estimated value of 
formant frequency from said linear prediction analysis unit, for recalculating cepstrum 
coefficients based on said value of formant frequency; and 

a logarithmic transformation and inverse discrete cosine transformation unit 
connected to receive said speech waveform data for calculating FFT cepstrum coefficients based 
on said waveform data, wherein 

the cepstral distance calculating unit is configured to calculate cepstrum distance 
between the cepstrum coefficients recalculated by said cepstrum re-generating unit and the FFT 
cepstrum coefficients calculated by said a logarithmic transformation and inverse discrete cosine 
transformation unit, said cepstrum distance indicating a distribution of unreliability; and 
said cepstral analysis unit includes: 

a standardizing and integrating unit which combines the cepstrum distance and 
the distribution on the time axis of local variance of spectral change and outputting a combined 
data, wherein 

the reliability center candidate output unit estimates the range in which change in 
the speech waveform is well controlled by said source at a dip of the combined data output by 
said standardizing and integrating unit. 

26. (New) The method according to claim 14, wherein
said step of calculating a distribution of energy includes: 

receiving said estimated value of formant frequency, and recalculating cepstrum 
coefficients based on said value of formant frequency; 

receiving said speech waveform data for calculating FFT cepstrum coefficients 
based on said waveform data; and 

calculating cepstrum distance between the recalculated cepstrum coefficients and 
the FFT cepstrum coefficients, said cepstrum distance indicating a distribution of unreliability; 
and wherein 

said estimating step further includes: 

combining the cepstrum distance and the distribution on the time axis of local 
variance of spectral change and outputting a combined data; and 

estimating the range in which change in the speech waveform is well controlled 
by said source at a dip of the combined data. 